Identifying Business Sectors from Stock Price Fluctuations 

Firms having similar business activities are correlated. We analyze two different cross-correlation 
matrices C constructed from (i) 30-min price fluctuations of 1000 US stocks for the 2-year period 
1994-95 and (ii) 1-day price fluctuations of 422 US stocks for the 35-year period 1962-96. We find 
that the eigenvectors of C corresponding to the largest eigenvalues allow us to partition the set of all 
stocks into distinct subsets. These subsets are similar to conventionally-identified business sectors, 
and are stable for extended periods of time. Using a set of coupled stochastic differential equations, 
we argue how correlations between stocks might arise. Finally, we demonstrate that the sectors we 
identify are useful for the practical goal of finding an investment which earns a given return without 
exposure to unnecessary risk. 

The internal structure of a complex system manifests 
itself in correlations among its constituents. In physical 
systems, one relates correlations to basic interactions, but 
for the stock market problem, the underlying 'interactions' 
are not known. Suppose that the change of stock 
prices can be visualized by the motion of point particles. 
Correlated particle motion can be pictured as "strings" 
connecting pairs of particles. Given only the records of 
the particle positions at equal time intervals, how can we 
identify the strings without 'seeing' them? One approach 
is to first calculate the cross-correlation matrix C whose 
elements $C_{ij}$ are the correlation coefficients between the 
velocities of two particles $i$ and $j$. The eigenvectors of 
C convey information about the collective modes of the 
system. 

What is the analog of the cross-correlation matrix C 
for the stock market problem? We define the cross-correlation 
matrix C with elements 
$C_{ij} = [\langle G_i G_j \rangle - \langle G_i \rangle \langle G_j \rangle] / (\sigma_i \sigma_j)$, 
where $\sigma_i$ is the standard deviation of the price fluctuations 
(returns) $G_i(t) = \ln S_i(t + \Delta t) - \ln S_i(t)$, $S_i(t)$ denotes the 
price of stock $i = 1, \ldots, N$, and $\langle \ldots \rangle$ denotes a time 
average over the period studied. To investigate correlations on 
different time scales, we analyze (i) 30-min returns of the $N = 1000$ 
largest stocks for the two-year period 1994-95 and (ii) daily returns 
of $N = 422$ stocks for the 35-year period 1962-96. 
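
The definition of C can be sketched numerically as follows. This is a minimal illustration in Python with NumPy, assuming prices are stored as an array of shape (T + 1, N); the function names and the synthetic random-walk prices are hypothetical and are not the data analyzed here.

```python
import numpy as np

def log_returns(prices):
    """G_i(t) = ln S_i(t + dt) - ln S_i(t) for a (T + 1, N) price array."""
    return np.diff(np.log(prices), axis=0)

def cross_correlation(returns):
    """C_ij = (<G_i G_j> - <G_i><G_j>) / (sigma_i sigma_j), <...> = time average."""
    centered = returns - returns.mean(axis=0)
    sigma = centered.std(axis=0)              # same 1/T normalization as <...>
    normalized = centered / sigma
    return normalized.T @ normalized / returns.shape[0]

# Synthetic random-walk prices, used only to exercise the functions.
rng = np.random.default_rng(0)
prices = np.exp(np.cumsum(0.01 * rng.standard_normal((501, 8)), axis=0))
G = log_returns(prices)
C = cross_correlation(G)
```

By construction C is symmetric with unit diagonal, so its trace equals N, a property used later when the largest eigenvalue is discussed.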

We first diagonalize C and rank-order its eigenvalues $\lambda_k$ 
such that $\lambda_{k+1} > \lambda_k$; the corresponding eigenvectors are 
denoted $u^k$. Next, we analyze the components of those 
deviating eigenvectors whose eigenvalues are larger than 
the upper bound for uncorrelated time series. A direct 
examination of these eigenvectors, however, does not 
yield a straightforward interpretation of their economic 
relevance. To interpret their meaning, we note that the 
largest eigenvalue is an order of magnitude larger than 
the others, which constrains the remaining $N - 1$ eigenvalues 
since $\mathrm{Tr}\, C = N$. Thus, in order to analyze the 
contents of the deviating eigenvectors, we first remove 
the effect of the largest eigenvalue. 
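
A sketch of the diagonalization and of the selection of deviating eigenvalues is given below. The text does not state the formula for the upper bound, so the code assumes the standard random-matrix (Marchenko-Pastur) edge $\lambda_+ = (1 + \sqrt{N/L})^2$ for N uncorrelated series of length L; this choice, the function name, and the synthetic one-factor data are assumptions made for illustration only.

```python
import numpy as np

def deviating_eigenpairs(C, n_obs):
    """Eigenvalues and eigenvectors of C above the assumed random-matrix edge."""
    n_stocks = C.shape[0]
    # eigh returns eigenvalues in ascending order: lambda_{k+1} >= lambda_k.
    eigvals, eigvecs = np.linalg.eigh(C)
    # Assumed upper bound for uncorrelated series (Marchenko-Pastur edge).
    lambda_plus = (1.0 + np.sqrt(n_stocks / n_obs)) ** 2
    mask = eigvals > lambda_plus
    return eigvals[mask], eigvecs[:, mask], lambda_plus

# Hypothetical example: 20 synthetic series sharing one common factor.
rng = np.random.default_rng(1)
n_obs, n_stocks = 2000, 20
common = rng.standard_normal((n_obs, 1))
returns = 0.5 * common + rng.standard_normal((n_obs, n_stocks))
C = np.corrcoef(returns, rowvar=False)
vals, vecs, lambda_plus = deviating_eigenpairs(C, n_obs)
```

The columns of `vecs` are ordered like `vals`, so the last column corresponds to the largest eigenvalue.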

To analyze the information contained in the eigenvectors 
$u^k$, we partition the 1000 stocks into groups labeled 
$\ell = 1, \ldots, 75$ (comprising $N_\ell$ stocks each) according 
to the first two digits of their Standard Industrial 
Classification (SIC) code, which classifies major industry 
groups. We define a projection matrix P, with elements 
$P_{\ell i} = 1/N_\ell$ if stock $i$ belongs to group $\ell$ and 
$P_{\ell i} = 0$ otherwise. For each deviating eigenvector $u^k$, 
we compute the contribution 
$X^k_\ell = \sum_{i=1}^{N} P_{\ell i} [u^k_i]^2$ of each industry 
group $\ell$. The above procedure of computing $X^k_\ell$ is analogous 
to the analysis of wave functions in disordered systems, 
where one calculates the probability of finding a 
particle in a given region. 
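
The projection onto industry groups can be illustrated with a toy example. The sketch below assumes each stock carries an integer group label and that the contribution is the group average of the squared eigenvector components, as the definition of P suggests; the labels and the six-stock eigenvector are hypothetical.

```python
import numpy as np

def sector_contributions(u, group_of_stock, n_groups):
    """X_l = sum_i P_li * u_i**2, with P_li = 1/N_l if stock i is in group l."""
    group_sizes = np.bincount(group_of_stock, minlength=n_groups)
    P = np.zeros((n_groups, u.size))
    stock_index = np.arange(u.size)
    P[group_of_stock, stock_index] = 1.0 / group_sizes[group_of_stock]
    return P @ u**2

# Toy case: 6 stocks in 3 groups; the eigenvector lives entirely on group 1.
groups = np.array([0, 0, 1, 1, 2, 2])
u = np.array([0.0, 0.0, 1.0, 1.0, 0.0, 0.0]) / np.sqrt(2.0)
X = sector_contributions(u, groups, n_groups=3)
```

An eigenvector concentrated on one group produces a single peak in X, which is the signature looked for in the deviating eigenvectors.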

Figure 1 shows $X^k_\ell$ for the eigenvectors belonging to the ten 
largest eigenvalues, after excluding the influence of the largest 
eigenvalue. The contribution $X^{999}_\ell$ is spread over several 
industries. We examine the significant contributors and find mainly 
stocks with large market capitalization [Fig. 2]. We analyze 
$X^k_\ell$ for the remainder of the deviating eigenvectors and find a 
significant 'peak' at distinct values of the SIC code, 
suggesting that these eigenvectors correspond to distinct 
industry groups. 

One deviating eigenvector, $u^{995}$, displays large values of 
$X^k_\ell$ for firms belonging to the heavy construction industry 
and the telecommunications industry. In addition, an examination 
of these firms shows significant business activity 
in Latin America. Another case corresponds to eigenvectors 
$u^{996}$ and $u^{997}$, both of which contain a mixture of 
stocks of gold-mining firms and banking firms. We find 
that these two sectors separate when we compute the 
symmetric and antisymmetric combinations 
$(u^{996} \pm u^{997})/\sqrt{2}$. The remainder of the deviating 
eigenvectors display technology, metal mining, banking, petroleum 
refining, auto manufacturing, drug manufacturing, and paper 
manufacturing firms [Fig. 1]. 



We next focus on the interpretation of the largest 
eigenvalue $\lambda_{1000}$. Using the eigenvector $u^{1000}$, we 
construct a time series $G^{1000}(t) = \sum_{i=1}^{1000} u^{1000}_i G_i(t)$. We 
then compare $G^{1000}(t)$ with the returns $G_{SP}(t)$ of the 
S&P 500 index, a benchmark for gauging the performance 
of the entire US stock market. Regressing $G^{1000}(t)$ 
against $G_{SP}(t)$ shows a scatter around a linear fit with 
slope 0.85 ± 0.09 [Fig. 3]. Thus, we interpret the eigenvector 
$u^{1000}$ as the influence of the entire market, which is 
common to all stocks. 
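
The comparison between the market mode and an index can be sketched as follows. The example assumes both series are scaled to unit standard deviation before the fit, and it uses the equal-weight average of synthetic returns as a hypothetical stand-in for the index; neither the data nor the helper names come from the analysis described here.

```python
import numpy as np

def mode_time_series(returns, u):
    """G^k(t) = sum_i u_i^k G_i(t) for a (T, N) array of returns."""
    return returns @ u

def standardized_slope(x, y):
    """Least-squares slope of y on x after scaling both to unit variance."""
    xs = (x - x.mean()) / x.std()
    ys = (y - y.mean()) / y.std()
    slope, _intercept = np.polyfit(xs, ys, 1)
    return slope

# Synthetic returns with one common factor (illustration only).
rng = np.random.default_rng(2)
common = rng.standard_normal((1000, 1))
returns = common + 0.5 * rng.standard_normal((1000, 10))
index_returns = returns.mean(axis=1)          # hypothetical index proxy
_vals, vecs = np.linalg.eigh(np.corrcoef(returns, rowvar=False))
u_top = vecs[:, -1]                           # eigenvector of largest eigenvalue
if u_top.sum() < 0:                           # eigenvector sign is arbitrary
    u_top = -u_top
slope = standardized_slope(index_returns, mode_time_series(returns, u_top))
```

With both axes standardized, the fitted slope equals the correlation coefficient between the two series, so it cannot exceed one in magnitude.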

Next, we examine whether the eigenvectors $u^k$ corresponding 
to business sectors remain stable in time. Partitioning 
the year 1994 into two 6-month periods, A and B, 
we calculate the corresponding eigenvectors $u_A^i$ and $u_B^j$ of 
the cross-correlation matrices and quantify the time stability 
by calculating the magnitude of the scalar products 
$O_{ij} = |u_A^i \cdot u_B^j|$ for the 20 largest eigenvalues. Perfect time 
stability would mean $O_{ij} = \delta_{ij}$. For $i = 1000$, we find 
$O_{ii} = 0.93$, indicating almost perfect stability. We find 
that $O_{ii}$ decreases as $i$ decreases from 1000 [Fig. 4]. Extending 
this analysis to daily returns using database (ii) 
shows that the eigenvectors corresponding to the largest 
3 eigenvalues are stable for as many as 10 years. 
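
The stability measure can be sketched in a few lines. The example below splits a synthetic return series into two halves and computes the magnitudes of the scalar products between the leading eigenvectors of each half; the data and function names are hypothetical.

```python
import numpy as np

def top_eigenvectors(returns, n_top):
    """Eigenvectors of the n_top largest eigenvalues, in ascending order."""
    C = np.corrcoef(returns, rowvar=False)
    _vals, vecs = np.linalg.eigh(C)           # ascending eigenvalues
    return vecs[:, -n_top:]

def overlap_matrix(returns_a, returns_b, n_top):
    """O_ij = |u_A^i . u_B^j| for the n_top largest eigenvalues of each period."""
    ua = top_eigenvectors(returns_a, n_top)
    ub = top_eigenvectors(returns_b, n_top)
    return np.abs(ua.T @ ub)

# Synthetic returns with one persistent common factor (illustration only).
rng = np.random.default_rng(3)
common = rng.standard_normal((4000, 1))
returns = common + rng.standard_normal((4000, 15))
O = overlap_matrix(returns[:2000], returns[2000:], n_top=3)
```

The last diagonal entry of `O` belongs to the largest eigenvalue; in this toy setting only that mode is expected to be stable, since the remaining eigenvectors come from noise.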

How can we explain correlations that are stable in 
time? In physical systems, one starts from the interactions 
between the constituents, and then relates interactions 
to correlated "modes" of the system. In economic 
systems, we ask if "interactions" give rise to the correlated 
behavior. Interactions can arise when two companies 
are doing business together, or compete for the same 
market. To study if the correlations can be explained 
through interactions, we model stock price dynamics 
by a differential equation which describes the "instantaneous" 
returns $g_i(t) = \partial_t \ln S_i(t)$ as a random walk 
with interactions $J_{ij}$: 

$\tau_i \partial_t g_i(t) = -r\, g_i(t) + \sum_j J_{ij}\, g_j(t) + \frac{1}{\tau_i} \xi_i(t)$ . (1) 

Here, $\xi_i(t)$ are Gaussian random variables with correlation 
function $\langle \xi_i(t) \xi_j(t') \rangle = \delta_{ij} \tau_i \delta(t - t')$, and $\tau_i$ are 
relaxation times of the $\langle g_i(t) g_i(t + \tau) \rangle$ correlation function. 
The return $G_i$ at a finite time interval $\Delta t$ is given by the 
integral of $g_i$ over $\Delta t$. 
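
To make the model concrete, the following sketch integrates Eq. (1) with a simple Euler-Maruyama scheme. It assumes the equation reads $\tau_i \partial_t g_i = -r g_i + \sum_j J_{ij} g_j + \xi_i/\tau_i$ with noise correlation $\delta_{ij} \tau_i \delta(t - t')$, so that the noise increment of $g_i$ over a step h has standard deviation $\sqrt{h}\, \tau_i^{-3/2}$. The parameter values, the two-stock coupling matrix, and the function name are hypothetical choices for illustration.

```python
import numpy as np

def simulate_returns(J, r, tau, h, n_steps, steps_per_interval, seed=0):
    """Euler-Maruyama integration of the assumed Eq. (1).

    Returns G_i, the integral of g_i over consecutive intervals of length
    steps_per_interval * h, as an array of shape (n_intervals, N).
    """
    rng = np.random.default_rng(seed)
    n = J.shape[0]
    g = np.zeros(n)
    path = np.empty((n_steps, n))
    noise_scale = np.sqrt(h) * tau ** -1.5    # std of the noise increment in g
    for step in range(n_steps):
        drift = (-r * g + J @ g) / tau
        g = g + h * drift + noise_scale * rng.standard_normal(n)
        path[step] = g
    n_intervals = n_steps // steps_per_interval
    trimmed = path[: n_intervals * steps_per_interval]
    return trimmed.reshape(n_intervals, steps_per_interval, n).sum(axis=1) * h

# Two stocks with a symmetric positive coupling (hypothetical values).
J = np.array([[0.0, 0.5], [0.5, 0.0]])
G = simulate_returns(J, r=1.0, tau=np.ones(2), h=0.01,
                     n_steps=50_000, steps_per_interval=100)
rho = np.corrcoef(G, rowvar=False)[0, 1]
```

In this toy setting the positive coupling produces a positive correlation between the two integrated returns, and the symmetric combination of the two stocks relaxes more slowly than the antisymmetric one.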

Calculating time-dependent correlation functions for 
the gi, we find that correlations caused by interactions 
are accompanied by a phenomenon analogous to "critical 
slowing down". The market time series $G^{1000}(t)$, as 
well as time series constructed similarly for other deviating 
eigenvectors, have considerably larger correlation 
times than a time series constructed out of a random 
eigenvector, consistent with the hypothesis that correla- 
tions between firms are caused by interactions. 

The eigenvectors that we interpret as defining business 
sectors also have relevance to the practical goal of finding 
an investment which earns a given return without exposure 
to unnecessary risk ("optimal portfolio"). Risk can 
be reduced by diversification of investment into independently 
fluctuating groups of stocks, such as the mutually 
uncorrelated business sectors that we find. Since the sec- 
tors (eigenvectors) are stable in time, we expect the ratio 
of risk to return of the portfolios constructed from them 
to be stable. 

Consider a portfolio $P(t) = \sum_{i=1}^{N} w_i S_i(t)$, where $w_i$ is 
the fraction of wealth invested in stock $i$. The portfolio 
return is given by $R = \sum_{i=1}^{N} w_i G_i$. The risk in holding 
the portfolio $P(t)$ can be quantified by the variance 
$D^2 = \sum_{i=1}^{N} \sum_{j=1}^{N} w_i w_j C_{ij} \sigma_i \sigma_j$, where $\sigma_i$ is the standard 
deviation of $G_i$. In order to find an optimal portfolio, 
we minimize $D^2$ under the constraints that the portfolio 
return is some fixed value $R$ and $\sum_{i=1}^{N} w_i = 1$. We 
thereby obtain a family of optimal portfolios, which we 
represent by plotting $R$ as a function of risk $D^2$ [Fig. 5]. 
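
The constrained minimization has a closed-form solution via Lagrange multipliers, which can be sketched as a single linear solve. The example below assumes short positions are allowed (weights may be negative) and uses a hypothetical three-asset correlation matrix, volatilities, and mean returns.

```python
import numpy as np

def min_variance_weights(C, sigma, mean_returns, target_return):
    """Minimize D^2 = w^T Cov w subject to w . G = R and sum(w) = 1."""
    cov = C * np.outer(sigma, sigma)          # Cov_ij = C_ij sigma_i sigma_j
    n = cov.shape[0]
    ones = np.ones(n)
    # Stationarity and constraint equations (KKT system) of the Lagrangian.
    kkt = np.zeros((n + 2, n + 2))
    kkt[:n, :n] = 2.0 * cov
    kkt[:n, n] = mean_returns
    kkt[:n, n + 1] = ones
    kkt[n, :n] = mean_returns
    kkt[n + 1, :n] = ones
    rhs = np.concatenate([np.zeros(n), [target_return, 1.0]])
    w = np.linalg.solve(kkt, rhs)[:n]
    return w, w @ cov @ w

# Hypothetical three-asset example.
C = np.array([[1.0, 0.3, 0.1], [0.3, 1.0, 0.2], [0.1, 0.2, 1.0]])
sigma = np.array([0.20, 0.30, 0.25])
mean_returns = np.array([0.05, 0.10, 0.07])
w, risk = min_variance_weights(C, sigma, mean_returns, target_return=0.08)
```

Repeating the solve for a range of target returns traces out the family of optimal portfolios, that is, the curve of R against $D^2$.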

To find the effect of randomness of the $C_{ij}$ on optimal 
portfolio selection, we partition the time period 1994-95 
into two 1-year periods. Using the cross-correlation matrix 
$C_{94}$ for 1994, and $G_i$ for 1995, we construct a 
family of optimal portfolios and plot $R$ as a function of 
the predicted risk $D_p^2$ for 1995 [Fig. 5(a)]. For this family 
of portfolios, we also compute the risk $D_r^2$ realized 
during 1995 using $C_{95}$ [Fig. 5(a)]. We find that the predicted 
risk is significantly smaller than the realized risk: 
$[D_r^2 - D_p^2]/D_p^2 \approx 170\%$. 

Since the meaningful information in C is contained in 
the deviating eigenvectors that define business sectors, we 
construct a 'filtered' correlation matrix C', by retaining 
only the deviating eigenvectors. We repeat the above 
calculations for finding the optimal portfolio using C' instead 
of C. Figure 5(b) shows that the realized risk is now 
much closer to the predicted risk: $[D_r^2 - D_p^2]/D_p^2 \approx 25\%$. 
Thus, the optimal portfolios constructed using C' are 
significantly more stable in time. 
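
One way to build such a filtered matrix is sketched below. The code keeps only the eigenmodes of the largest eigenvalues and then resets the diagonal to one so that the trace is preserved; the diagonal reset is an assumption of this sketch, since the text only states that the deviating eigenvectors are retained. The helper for the relative difference between realized and predicted risk, and the example matrices, are likewise hypothetical.

```python
import numpy as np

def filtered_correlation(C, n_keep):
    """Keep the n_keep largest eigenmodes of C, then restore a unit diagonal."""
    vals, vecs = np.linalg.eigh(C)            # ascending eigenvalues
    top_vals, top_vecs = vals[-n_keep:], vecs[:, -n_keep:]
    C_filtered = (top_vecs * top_vals) @ top_vecs.T
    np.fill_diagonal(C_filtered, 1.0)         # assumption: preserve Tr C = N
    return C_filtered

def relative_risk_error(w, cov_predicted, cov_realized):
    """(D_r^2 - D_p^2) / D_p^2 for a fixed weight vector w."""
    predicted = w @ cov_predicted @ w
    realized = w @ cov_realized @ w
    return (realized - predicted) / predicted

# Hypothetical example matrices.
C = np.array([[1.0, 0.6, 0.1], [0.6, 1.0, 0.1], [0.1, 0.1, 1.0]])
C_f = filtered_correlation(C, n_keep=1)
err = relative_risk_error(np.array([0.5, 0.5]), np.eye(2), 2.0 * np.eye(2))
```

In the toy call, the predicted variance is 0.5 and the realized variance is 1.0, so the relative error is 1, that is, 100%.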

In summary, given only the change in price of a stock, 
and no additional information about the stock, we can 
partition the set of all $10^3$ stocks studied into subsets 
whose identities correspond well to conventionally identified 
sectors of economic activity. The sector correlations 
are stable in time and can be used for the construction 
of optimal portfolios with a stable ratio of risk 
to return. 

We thank J.-P. Bouchaud, P. Cizeau, E. Derman, 
X. Gabaix, J. Hill, M. Janjusevic, R. N. Mantegna, 
M. Potters, L. Viceira, J. Zou, and especially L. A. 
N. Amaral for stimulating discussions, and DFG grant 
ROl-1/2447 for financial support. Our results on the 
applications to portfolio selection were presented at the 
APS March 2000 meeting by one of us (BR) and inde- 
pendently by P. Cizeau. 

FIG. 1. Contribution $X^k_\ell$ to industry sector $\ell$ of eigenvector 
$u^k$ for the deviating eigenvectors shows marked peaks at 
distinct values of SIC code, for all but $u^{999}$, which contains 
stocks with large capitalizations as significant contributors. 

FIG. 2. All $10^3$ eigenvector components of $u^{999}$ plotted 
against market capitalization (in units of US Dollars) show 
that large firms contribute more than small firms. The 
straight line, which shows a logarithmic fit, is a guide to the 
eye. 



FIG. 4. Comparison of eigenvectors for different time peri- 
ods A (first half of 1994) and B (second half of 1994) by means 
of their scalar product $O_{ij}$, represented on a greyscale, where 
zero (black) corresponds to no overlap, and white (one) to 
perfect overlap. Note that the eigenvectors corresponding to 
the 4 largest eigenvalues have a large degree of time stability. 

FIG. 3. S&P 500 returns $G_{SP}(t)$ regressed against the return 
$G^{1000}(t)$ of the portfolio defined by the eigenvector $u^{1000}$. 
Both axes are scaled by their respective standard deviations. 
A linear regression yields a slope 0.85 ± 0.09, showing a large 
degree of correlation. 

FIG. 5. (a) Portfolio return $R$ as a function of risk $D^2$ for 
the family of optimal portfolios (without a risk-free asset) 
constructed from the original matrix C. The top curve 
shows the predicted risk $D_p^2$ in 1995 of the family of optimal 
portfolios for a given return, calculated using 30-min returns 
for 1995 and the correlation matrix $C_{94}$ for 1994. For the 
same family of portfolios, the bottom curve shows the realized 
risk $D_r^2$ calculated using the correlation matrix $C_{95}$ for 
1995. These two curves differ by a factor of $D_r^2/D_p^2 \approx 2.7$. 
(b) Risk-return relationship for the optimal portfolios constructed 
using the filtered correlation matrix C'. The top 
curve shows the predicted risk $D_p^2$ in 1995 for the family of 
optimal portfolios for a given return, calculated using the filtered 
correlation matrix $C'_{94}$. The bottom curve shows the 
realized risk $D_r^2$ for the same family of portfolios computed 
using $C'_{95}$. The predicted risk is now closer to the realized 
risk: $D_r^2/D_p^2 \approx 1.25$. For the same family of optimal portfolios, 
the dashed curve shows the realized risk computed using 
the original correlation matrix $C_{95}$: $D_r^2/D_p^2 \approx 1.3$. 